A chromosome-level genome assembly of the pollinating fig wasp Valisia javana

Abstract Fig wasp has always been thought the species-specific pollinator for their host fig (Moraceae, Ficus) and constitute a model system with its host to study co-evolution and co-speciation. The availability of a high-quality genome will help to further reveal the mechanisms underlying these characteristics. Here, we present a high-quality chromosome-level genome for Valisa javana developed by a combination of PacBio long-read and Illumina short-read. The assembled genome size is 296.34 Mb from 13 contigs with a contig N50 length of 26.76 kb. Comparative genomic analysis revealed expanded and positively selected genes related to biological features that aid fig wasps living in syconium of its highly specific host. Protein-coding genes associated with chemosensory, detoxification and venom genes were identified. Several differentially expressed genes in transcriptome data of V. javana between odor-stimulated samples and the controls have been identified in some olfactory signal transduction pathways, e.g. olfactory transduction, cAMP, cGMP-PKG, Calcim, Ras and Rap1. This study provides a valuable genomic resource for a fig wasp, and sheds insight into further revealing the mechanisms underlying their adaptive traits to their hosts in different places and co-speciation with their host.


Introduction
Mutualism is ubiquitous in nature and at the core of the ecosystem, which has a profound impact on evolution at all levels of the ecosystem. [1][2][3] Recently, specific mutualism, such as fig and fig wasp, yucca and yucca moth, has attracted extensive interest of researchers due to their high degree of interspecific adaptation and coevolution. 4,5 The fig (Moraceae; Ficus) is pantropical and each species is likely pollinated by one to two wasps from the chalcid family Agaonidae.
However, as many as nine pollinators can occur across a single host. 6  maintained, and a high degree of synchronization and strict consistency are formed in morphology, behavior, physiology and development between them. 8 Chemosensitivity mainly acts on the aspects of foraging, oviposition, mating, avoiding natural enemies and searching for hosts of insects. 9 Fig wasps mainly rely on the accurate identification of volatile odors (VOCs) released from receptive syconium of their obligate host to find host. 10,11 Each Ficus species produces a specific volatile blend that is only attractive to its specific pollinators. 12,13 In addition to olfaction, other sensing, such as touch, taste and vision can also judge whether the host is suitable or not, 14 which is mainly involved in odorant binding protein (OBP), chemosensory protein (CSP), olfactory receptor (OR), ionic receptor (IR) and gustatory receptor (GR).
The formation of olfactory sensation involves a series of complex signal transduction pathways. OBP transports VOCs to OR, and then OR activates. 9 For insects, OR possess a distinct seventransmembrane topology with the amino terminus located intracellularly and lack homology to G-protein-coupled chemosensory receptors in vertebrates. 15,16 The activates of insects OR is either by the complex between it and its co-receptor (Orco) confers channel activity 16 or by G-protein-mediated activated second messenger systems, such as cAMP and cGMP. 17 Adult females of Nasonia vitripennis and other parasitic wasps in Chalcidoidea inject a venomous mixture into its host flies prior to oviposition to regulate host physiological processes, including immunity, developmental and metabolism, in order to protect and ensure offspring survival and successful development in the host. [18][19][20] Although fig wasps are pollinators, they are actually parasitic in female flower ovary of the host. The developments of them are synchronous with that of host inflorescence. 21,22 It has been reported that fig wasp inject venom into the ovary of female flowers to stimulate the production of endosperm to provide nutrition for the development of their eggs and larvae, and now the ovary is called gall. 23,24 A gall is a proliferation of tissue abnormal for a site and generated by physiological manipulation of the plant through the insect. 8 However, the venom gene of fig wasp has not been studied.
Highly species-specific lifestyle, behavior and living habit can be reflected at the genomic level. [25][26][27] Given it has a specific symbiotic relationship with the host for tens of millions of years, fig wasp is now emerging as a model system for comparative genomics to study co-evolution and co-speciation. [28][29][30] Three pollinating fig wasps, Cerotosolen solmi, 28 Eupristina verticillate 29 and Wiebesia pumilae, 11 have been sequenced at the genome level and annotated a large number genes about morphology, behavior, physiological and gender related, which were adapted to the highly specific symbiosis of fig wasps, and some gene families related to chemosensory, detoxification, innated immune response are reduced. There are more than 750 species of Ficus in the world. 31  Valisa javana is an obligate pollinator of a dioecious fig, Ficus hirta, which is widely distributed in Southeast Asia. Due to its typical asynchronous flowering phenology and located under the canopy, V. javana tends to pollinate within the population or even within a tree of its host, so its flight distance should be relatively limited and more likely to speciation by geographical isolation. The species complex occurred in F. hirta across Southeast Asia and eight of them are related. 6 In order to better understand the mechanism of speciation and differentiation of these related species, we need to have a general understanding of their genomics.
Here, we report a high-quality chromosome-level genome assembly for one of species of the complex, V. javana complex sp. 1, using a combination of PacBio long-read sequencing and Illumina shortread sequencing. The assembly has high completeness, providing an excellent genomic resource for subsequent research. In this study, besides the basic genome description and comparative genomics, we identified chemosensory genes, detoxification, venom genes and several differentially expressed genes (DEGs) in V. javana related to olfactory sensing. This genome assembly serves as a useful resource for further research into insect biology, pollinator-host interactions, comparative genomics and coevolution.

Genome sequencing
For a receptive male syconium of F. hirta, only one or a few female fig wasps can enter and oviposit there. 32 So, the pollinating wasps from one syconium are mainly the offspring of one or a few mothers. We have once checked the pollinator species of F. hirta in Guangdong Province by DNA sequencing and morphology based on plenty of samples, 6,33,34 and they are all V. javana complex sp. 1. 6 Nearly, 500-1,000 female adult individuals of V. javana were collected from several figs of F. hirta in a single tree in Baiyun Mountain, Guangdong Province of China for DNA extraction, libraries construction and whole-genome sequencing. DNA was extracted with Easy Pure Genomic DNA Extraction Kit (Beijing, China). 33 The paired-end libraries with insert size of $310 bp were constructed using VAHTS Universal DNA Library Prep Kit and then sequenced on an Illumina Hiseq4000 platform (San Diego, CA, USA). The raw Illumina reads were purified with Trimmomatic 0.39 35 and then used for downstream genome alignment. In addition, more than 10-15 lg of sheared and concentrated DNA was applied to sizeselection by BluePippin system. Long DNA fragments of $20 kb SMRTbell TM libraries were constructed with Sequel Sequencing Kit 2.0. A total of four single-molecule real-time cells were sequenced on Pacbio RSII system.

Assessment of the genome completeness and quality
The completeness of the genome was evaluated through estimating the genome size, BUSCO (Benchmarking Universal Single-Copy Orthologues) analysis and genome coverage calculation.
The clean data of Illumina reads were used for K-mer distribution analysis by GenomeScope2.0 version 1.0.0 41 for genome survey.
With K-mer size set to 21 and ploidy set to haplotype, the genome size was estimated to be 304 Mb.
BUSCO version 4.1.2 42 was used to evaluate the completeness of the contigs assembly and whole-genome proteins separately based on the hymenoptera_odb10 database which including 5,991 BUSCOs. Besides, we also performed BUSCO analysis using whole-genome proteins of 29 other insect species based on corresponding databases (endopterygota_odb10 for Coleoptera species, diptera_odb10 for Diptera, hymenoptera_odb10 for Hymenoptera and hemipter-a_odb10 for Hemiptera).
The genome coverage was calculated by mapping the Illumina reads to a reference genome using Bowtie2 version 2.3.5. 43

Repeat annotation
We detected repetitive sequences and transposable elements in the genome using a combination of de novo and homology-based approaches. Repetitive sequences appeared at least 16 times were searched by ab inito algorithm using RepeatModeler version 2.0.1. 44 The conserved transposons were searched by RepeatMasker version 4.1.0 45 according to the Insecta repeats within RepBase database. 46

Non-coding genes annotation
The non-coding genes including tRNAs, rRNAs, snRNAs and miRNAs were mainly annotated by aligning the genomic sequence against the covariance models (CMs) of RFAM database version 14.5 47 with Infernal version 1.1.2. 48 In contrast to only one CM of tRNA in Rfam database, more refined eukaryotic CMs were used by tRNAScan-SE version 2.0.2 49 for a more accurate prediction.

Gene prediction and functional annotation
Protein-coding genes were predicted by combing RNA sequencing (RNA-seq)-based, homolog-based and ab initio methods performed on the repeat masked genome. For the RNA-seq-based method, short reads from transcriptome sequencing were aligned to the genome with Hisat2 version 2.1.0 50 and the gene structures were built by Sam2transfrag, a module of GETA version 2.4.6 (https://github.com/ chenlianfu/geta, 22 May 2022, date last accessed). The homologous proteins from the whole genome of three fig wasp species, W. pumila, Ceratosolen solmsi, Eupristina verticillata, jewel wasp N. vitripennis, Copidosoma floridanum and fruit fly Drosophila melanogaster were employed to predict gene structures by genewise version 2.4.1. 51 Then the derived complete or partial gene models were used to train Augustus HMM (Hidden-Markov Model) parameters and predicted genes ab initio assisted by the hints of intron, CDS, exon, start and stop codon using AUGUSTUS version 3.3.3. 52 The results of three methods were integrated, and the gene models were filtered if it couldn't match to a HMM model from eukaryotic HMM database of eggNOG version 5.0. 53 The functional annotation of protein-coding genes was performed by aligning protein sequences to Nr, 54 Swiss-Prot, 55 KOG 56 and eggNOG version 5.0 53 using BLASTP method of Diamond 2.0.9.147. 57 Furthermore, the databases of InterPro version 85.0 58 and Pfam version 34.0 59 were used for gene family and domain annotation by Interproscan 5.51 58 and Hmmer 3.3.2 60 separately. The genes annotated from eggNOG and InterPro were integrated by Gene Ontology (GO). The KEGG (Kyoto Encyclopedia of Genes ad Genomes) annotation were employed by web tool KAAS. 61

RNA-Seq and analysis
In total, six RNA-Seq libraries were constructed, including three biological replicate samples of V. javana which were stimulated by odor emitted by B-phase male syconia of its host for at least 30 min and three biological replicates of controls. Approximate 50-60 female adult individuals for each sample were collected for RNA extraction, libraries construction and RNA-Seq. RNA was isolated using TRIzol TM (Tiangen). For each sample, a messenger RNA (mRNA)-Seq library was constructed using an Illumina TruSeq TM RNA Sample Preparation Kit (Illumina, San Diego, CA, USA) following the manufacturer's recommendations. The isolation of mRNA, fragment interruption, complementary DNA (cDNA) synthesis, adaptor ligation, PCR amplification and RNA-Seq were performed by Novogene Bioinformatics Technology Co., Ltd. (Beijing, China). The transcriptomes were sequenced using the Illumina HiSeq 2000 platform with paired-end libraries.
Low-quality reads were filtered using Trimmomatic version 0.38. 35 The RNA-seq clean reads were mapped to the V. javana genome using Hisat2 version 2.1.0. 50 Then the alignments were processed for gene counts calculation by HTseq version 0.11.2. 62 The raw counts of all samples were normalized across transfer matrix method (TMM) algorithm to get gene expressions. 63 After low expression genes filtered with threshold, the false discovery rate (FDR) values of genes were calculated by EdgeR 3.30.3 64 and DEseq2 1.28.1, 65 and then DEGs were identified. The analysis of GO and KEGG pathway enrichment for DEGs were performed by Fisher's exact test. One of stimulated samples was an outlier in almost all the pairwise comparisons and made the number of DEGs very low. Therefore, an additional analysis was done including only two stimulated samples.

Identification of ortholog and phylogenetic analysis
To find orthologous genes, genome the coding sequence (CDS) and protein sequences of 14 Hymenoptera species (including four fig wasp species, V. javana, W. pumilae, E. verticillata and C. solmsi), four Diptera species (including D. melanogaster), four Lepidoptera species, four Coleoptera species and four Hemiptera species were downloaded from NCBI (Supplementary Table S1). Pairwise comparison of protein sequences was made using BLASTP in Diamond 2.0.9.147 57 and filtered by in-house perl script. Orthologous gene pairs were found using OrthoMCL 2.0.9 66 and clustered by MCL algorithm. 67 Each orthologous cluster group (OCG) represents a gene family and contains genes from at least two species. Functional annotations including Nr, Swiss-Prot, KOG, eggNOG, Pfam, KAAS and GO were performed, and the OCG obtained responding functional annotations when !30% of its genes had the same annotation.
We reconstructed a phylogeny for 30 insects using the single-copy genes from OrthoMCL results. The protein sequences of every species in each OCG were independently aligned by MAFFT version 7.407. 68 These sequences were then transferred to CDS sequences using in-house perl script and retained the conserved blocks !60 bp using Gblocks 0.91 b. 69 At last, the aligned conserved codon sequences of 1,467 single-copy genes were concatenated to obtain the maximum likelihood (ML) tree using RAxML version 8.2.12 70 with bootstrap set to 300. The divergence time among species was calculated by MCMCTree in PAML version 4.9i. 71 Three calibration time points from TimeTree database, 72

Genome synteny
According to the topological structure of the phylogenetic tree, the evolutionary relationship between V. javana and W. pumila is the closest. So genome synteny between V. javana and W. pumila was detected by MCScanX 73 with default parameters and plotted using Circos version 0.69-6. 74

Genes under positive selection
In total, 16,098 OCGs present in at least 4 species were analyzed potentially experienced positive selection. First, codon alignment was transferred and an unrooted sub-tree was extracted from the phylogenetic tree for each OCG. Second, dN/dS was calculated for the pairwise comparison between all codon sequences of each OCG using in PAML version 4.9i 71 and choose the OCGs with dN/dS > 1.0 as candidate genes. Combining the union set of the above two algorithms, 66 candidate OCGs are obtained as candidate positive selection genes for the following analysis: for these genes, the subtree information containing only the corresponding species was extracted according to the phylogenetic tree. Then the positive selection analysis was carried out together with the results of codon multi sequence alignment. The branch-site model A

Gene family expansion and contraction
In this study, the Pfam gene families were used to analyze expansion and contraction by CAFE 4.2.1. 75 The global parameter k, which described both the gene birth (k) and death (l ¼ Àk) rate across all branches in the tree for all gene families, was estimated to be 0.00151763700272 using ML. The error model value which represented the error probability of gene family size was set to be 0.0125 through iterative computations by caferror.py in CAFE. A conditional P value was calculated for each gene family, and families with conditional P values < threshold (0.01) were considered as having an accelerated rate of gain or loss.

Gene family analysis
Five chemosensory gene families were focused on and identified through HMM models from Pfam A database, including GRs (PF06151 and PF08395), ORs (PF02949), IRs (PF00060), OBPs (PF01395), CSPs (PF03392). Then, we further manually annotated these genes in four fig wasp species by HMM model search (http://www.hmmer.org/, 22 May 2022, date last accessed) and Interproscan software screening. 76 The five types of chemosensory genes of D. melanogaster, A. mellifera and N. vitripennis have been fully studied, so we used them as outgroup [77][78][79][80][81] to construct phylogeny by ML method in FastTree v 2, 82 and the confidence of each branch was tested by bootstrap 1,000 times. According to Orthomcl software 66  Hymenoptera insects were also analyzed in this study. Seventy-nine venom protein sequences were reported in the model species of N. vitripennis, 83,84 of which 41 protein sequences can be downloaded in NCBI. We used these 41 protein sequences to blast venom genes in four fig wasp species and other ten Hymenoptera species (including N. vitripennis) and D. melanogaster. Venom proteins belong to secretory proteins, so we retained the protein sequences with signal peptide but without transmembrane region.
In addition, three detoxification gene families were focused on and identified through HMM models from Pfam A database, including Glutathione-S-transferases (GSTs; PF00043; PF02798; PF13409; PF13410; PF13417; PF14497; PF17171), cytochrome P450s (P450s; PF00067), and carboxylesterases (CCEs; PF00135). The three types of detoxification genes of A. mellifera and N. vitripennis as outgroup to construct phylogeny by ML method in FastTree v2, 82 and the confidence of each branch was tested by bootstrap 1,000 times.

Chromosome-level genome assembly
The genome of V. javana sequenced was generated 28 Fig. S1). After genome assembly of the PacBio reads and correction with Illumina reads, we obtained a reference of 296.34 Mb with contig N50 of 26.76 Mb (Supplementary Table S3 Table 1).
The assembly of V. javana genome is high-quality with 13 contigs assembled from genomes de novo (Fig. 1). We assessed the completeness of the assembly using BUSCO, and 5,516 out of 5,991 (92.1%) conserved Hymenoptera genes were found in the whole-genome nucleotide sequences of V. javana ( Supplementary  Fig. S2). In addition, we further compared the BUSCO integrity of the genome of other published fig wasp species, W. pumila, E. verticillata and C. solmi, and the results were all $92% ( Supplementary  Fig. S2). Although the genome assembly of W. pumila has reached chromosome level. This indicates that the Hymenoptera_ Odb10 provided by BUSCO is mainly constructed according to the protein sequences of some model species, such as A. mellifera and Acromyrmex echinatiord, so it may be not suitable for fig wasps.
The telomere sequence of animal genome is usually simple repeat sequence. By analyzing the genomic sequence of V. javana, we found there are three kinds of repeat units in telomere sequence: ATTGGGTT, ATCGGGTT and ATCTATTC (Supplementary Table  S4). The first two telomeres are located at one end of the chromosome, and the third telomere is located at the other end of the chromosome. Among the 13 contigs, 7 of them have 2 telomeres at 2 ends; 4 of them were found 1 telomere at the end suggesting a good integrity of this genome assembly.

Genome annotation
In total, 7.14% of the V. javana genome consists of repeat sequences, of which transposons are 0.61%. The main repeat is simple repeats with 4.02% of the genome (Supplementary Table S5 and Fig. 1). The rate of repetitive sequences in genome of V. javana is similar to that of two fig wasps, W. pumilae and C. solmi, and smaller than that of most insects of the remaining Hymenoptera and other orders (Supplementary Table S6).
Non-coding genes in eukaryotes mainly include rRNA, tRNA, small nuclear RNA and miRNA. In genome of V. javana, we annotated 62 microRNAs, 119 tRNAs, 31 rRNAs and 30 small nuclear RNAs (Supplementary Table S7). In addition to the above four types  of non-coding RNAs, 26 histone downstream element were found in the genome of V. javana, which is a stem-loop rich in purine in the untranslated region (UTR) region at the 3 0 end of histone. By combining with stem loop binding protein, the precursor of histone mRNA was processed to form a special stem loop at the tail of 3 0 end, and not the common PolyA tail. 85 We predicted 12,797 protein-coding genes in the genome of V. javana based on three lines of evidence. The protein-coding gene number of V. javana is similar to that of most Chalcidoidea wasps which are from NCBI, such as E. verticillata with 14,012 genes, N. vitripennis (parasitic wasp: 13,589), and C. floridanum (parasitic wasp: 12,136 genes; Supplementary Table S8). For V. javana, the length of gene, cDNA and CDS were 5,462, 1,915 and 1,002 bp, respectively. The median length of single exon and single intron were 214 and 412 bp, with an average of 5 exons per gene (Supplementary Table S8).
More than 80.53% of the predicted genes (10,306 genes) of V. javana have homology in public databases of Nr, Swissprot, KOG, EggNOG, InterPro, KEGG and GO (Supplementary Table S9).

Comparative genomics and phylogenetic reconstruction
We blasted 391,324 genes by blastp among V. javana and the other 29 insect species used in our analysis, of which 288,079 genes (37.51%) of all 30 species could be identified as orthologous genes among at least 2 species, and were divided into 27,219 OCGs. Among these OCGS, functions of 27,011 OCGs (99.23%) could be annotated. In genome of V. javana, 10,051 (70.12%) genes were distributed in 9,633 OCGS, 1,210 (8.44%) were identified as paralog and 3,072 (21.42%) were orphan genes.
The number of orthologous genes is from 2,448 to 2,827 across 30 insect species accounting for 14.58-25.03% of all their genome genes (Fig. 2). The number and proportion of orphan genes in different species vary greatly, whereas the number of orthologous genes in related species is very close.
The phylogenetic relationships between V. javana and the other 29 insect species were determined with a genome-wide set of 1,647 singlecopy genes (Fig. 2). As expected, V. javana has a closer relationship to three other fig wasp species than the other two species of chalcidoids. The six chalcidoids (V. javana, W. pumila, E. verticillata, C. solmsi, N. vitripennis and C. floridanum) cluster together, with eight other Hymenoptera insects as a sister group. Estimated divergence times of V. javana and W. pumila (calculated using mcmctree) suggest that V. javana diverged from the common ancestor of the other members in the family Agaonidae $39.22 Mya (28.1-50.7 Mya; Fig. 2). The lineage to which N. vitripennis belongs was estimated to have diverged from C. floridanum $120Mya, while 113 Mya was estimated for the divergence of the clade including N. vitripennis and C. solmsi from C. floridanum in. 86

Genome synteny
According to the topological structure of the phylogenetic tree, the evolutionary relationship between V. javana and W. pumila is the closest. So, synteny of the V. javana assembly was compared with W. pumila. The synteny analysis is based on the same distributed sequence of orthologous genes between two species. V. javana showed a high level of synteny with W. pumila which was consistent with their closer relationship in the phylogenetic tree (Fig. 3).
There are 8,800 genes (59.30% of the genome) in V. javana and 8,612 genes in W. pumila distributed in 168 synteny blocks, of which 53 belong to large synteny blocks (containing ! 50 homologous genes), of which the largest block contains 293 homologous genes. For V. javana, three groups of contig 1 and contig 6, contig 2 and contig 9, contig 3 and contig 5 are matched to Chromosomes 1, 4, 3 of W. pumila, respectively.

Genes under positive selection
A total of 27 and 25 PSGs were identified in V. javana and Agaonidae respectively from the 66 candidate positive selection genes (Supplementary Table S10). There are 14 PSGs shared between the clade of V. javana and Agaonidae, which are mainly related to signal transduction, genetic information processing, nervous system and endocrine system. For V. javana, the most significant PSGs (P ¼ 0) were innexin, LIM domain transcription factor and LIM domain protein. Innexin belongs to hexameric protein family and can constitute gap junctions to perform the function of intercellular communication. 87 LIM domain transcription factor and LIM domain protein play an important role in the development and regulation of perception and movement. 88 In Agaonidae, there is one PSG, pikachrin, which can help transmit electrical signals from retina to brain more quickly and efficiently, 89

Gene family expansion and contraction
When compared with W. pumila, 8 and 10 gene families were expanded and contracted in the V. javana genome according to Pfam blasting (Supplementary Table S11). Among the expanded Pfam, there are OR, OBP, trypsin (PF00089), Kazal-type serine protease inhibitor domain (PF00050, PF07648), G protein-coupled chemoreceptor protein (PF10328) and transcription factor (PF13909). The expansion of OR, OBP and G protein coupled chemoreceptor protein suggests that V. javana may have stronger odor perception than W. pumila. Trypsin is an important digestive protein in most insects, which is widely distributed in the digestive tract of insects of different orders and feeding habits. 90 The number of serine protease inhibitors in the genome of V. javana is as high as 235, which can inhibit the activity of serine protease and has antibacterial activity. 91 Serine protease inhibitor may exist in the venom protein of insects, 92 and protect their eggs. 93 The contraction gene family in V. javana is mainly histone gene family: core histone H2A/H2B/H3/H4 (PF00125); histone-like transcription factor (CBF/ NF-Y) and archaeal histone (PF00808); C-terminus of histone H2A (PF16211; Supplementary Table S11).
When compared with N. vitripennis, 2 and 82 gene families were expanded and contracted in the clade of Agaonidae which contained four fig wasp species according to Pfam blasting (Supplementary Table S11). Two expanded gene families are pao retrotransposon peptidase and trypsin. Among the contracted gene families, several are chemosensory genes (e.g. CSP, OBP, OR), detoxification genes (e.g. P450, CCE, ecdysteroid kinaselike), ANK, histone-related genes and cuticle protein which are major components of the insect cuticle-associated organs such as integument and wings. 94  Table S12 and Fig. 1).
There were two genes were up-regulated and 60 genes were down-regulated after olfactory treatment (Fig. 4). The up-regulated genes are venom gene CCE and the unknown functional gene, respectively. Go and KEGG pathway enrichment showed down-regulated genes played an important role in signal transduction.

Chemosensory genes
Five chemosensory gene families were blasted by Pfam in 30 species genomes ( Fig. 5a and Supplementary Table S13). We further identified these genes by manual annotation in four fig wasp species. The number of them in each family is the same as that obtained by Pfam except for ORs with one less in each of the three species, V. javana, W. pumila and C. solmsi, which implies that the genes got by blasting Pfam are reliable. OBPs are involved in the first step of olfactory signal transduction, carrying airborne semi-chemicals to the ORs. 95 The  (Supplementary Table S13), and divided into more than 17 groups, of which only 4 groups (1-4) contained genes of all four fig wasp species (Supplementary Fig. S3). Groups 2, 3 and 14 were clustered into one clade with Dmelobp83a and Dmelobp83b from D. melanogaster which are responsible for the detection of pheromones. 77,98 Group 5 and 17 are related and contain five repeats OBPs with three of them highly expressed in V. javana transcriptome ( Supplementary Fig. S3 and Table S14). These two groups clustered into one clade with DmelOBP69a, which has been predicted binding phenylalkylamine and pheromone involved both courtship and olfactory behavior 77,98,99 and may have important functions in fig wasps. According to V. javana transcriptome, one OBP gene was highly expressed in Groups 7,8,9 and 12,respectively (Supplementary  Table S14). These four groups clustered into one clade with DmelOBP56a, DmelOBP56d and DmelOBP56e which can bind to odor compound or pheromones. 77 CSPs is much more conserved across fig wasps, with fewer quantity differences among species and more homology groups contained genes from all four fig wasps (Fig. 5a, Supplementary Table S13 and  Fig. S4). Four CSPs with high expression in V. javana transcriptome and two of them in Groups 1 and 7 clustered with AmelCSP1 and AmelCSP5/AmelCSP2 respectively ( Supplementary Fig. S4 and Table S14). AmelCSP1 has strong affinity to straight chain alcohols and esters, whereas AmelCSP2 has strong binding ability to aromatic compounds. 79 melanogaster, which is a kind of coreceptor (Orco) remains in a complex with ORs in the sensory compartment to enhance the sensitivity of common ORs to the corresponding odor molecules. 100 In Group 2, we found one OR gene having eight repeats in V. javana, whereas in Group 3 there is also one OR gene having two repeats and having four repeats in C. solmsi ( Supplementary  Fig. S5). The serial numbers of most repeats are consecutive which imply they may gained by tandem gene duplications. 101 So, the number of OR in both species is a little higher than that of the other two species, W. pumilae and E. verticillate. We also found some ORs which have a relatively high expression in V. javana transcriptome (Supplementary Fig. S5 and Table S14).
The number of IRs varied little among insect species (Fig. 5a and Supplementary Table S13

Venom genes
When compared with 41 venom genes in N. vitripennis downloaded from NCBI, we found 13, 12, 7 and 12 venom genes in V. javana, W. pumila, E. verticillata and C. solmi, respectively and divided into 10 functional categories (Fig 5b, Supplementary Tables S15 and S16) including proteases and peptidases, protease inhibitors, carbohydrate metabolism, DNA metabolism, glutathione metabolism, esterases, recognition/binding proteins, immune-related proteins, others and unknown. 84   alter host physiology to support the developments of endoparasitoid by cellular degradation, cell signaling, hormone regulation or by inhibiting synthesis of certain proteins. 107,108

Detoxification genes
The genes of GSTs, P450s and CCEs are mainly involved in the detoxification and metabolism of compounds by insects. [109][110][111] In addition, CCE also plays a key role in insect development and behavior, and participates in the degradation of odor molecules and chemical pheromones, insect reproduction and digestion. 112 P450 has a wide range of substrates and diverse catalytic functions, and can also participate in a series of physiological and biochemical reactions in insects except detoxification, such as the regulation of hormones, steroids and fatty acids. 113,114 When compared with N. vitripennis, we observed some shared events of gene losses among four fig wasp species, such as Clades 2, 3, 4, 5 and 7 in P450s family ( Supplementary Fig. S8), Clade 2 in GSTs family ( Supplementary Fig. S9) and three clades in CCE family ( Supplementary Fig. S10), possibly attributable to ancestral divergence from other Chalcidoidea insects. The reductions of three gene families in fig wasps may be related to its strict host specificity (Fig. 5b) and thus most of its lifetime spending in a closed inflorescence. 28 Although the fewer number gene of three detoxification enzymes in A. mellifera has been assumed relating with its specialized eusocial behavior and homeostasis of the nest environment. 115

Olfaction-related pathways
From genome and comparative transcriptome sequencing, we found several DEGs between fig wasp samples stimulated by host odor and the controls, such as adenylyl cyclase, TIAM1, PLCE, adenylate cyclase and CaN, which may play important roles to regulate signal transduction. [116][117][118] in cAMP, cGMP-PKG, Calcim, Ras and Rap1 signal transduction pathways which were believed to be involved in olfactory sensory transduction in both vertebrates and insects ( Fig. 6 and Supplementary Figs. S11-S16). 9,119,120 In insects, the exact olfactory signal transduction pathway is still elusive and must be highly asymmetric, as has also been demonstrated for vertebrate olfactory. 17,121,122 Except the second messengers, the complex between OR and Orco in insects can confer channel activity 16 (Fig. 6).
Most of these genes are down-regulated in odor stimulation samples than the control. The reason is that in contrast to vertebrates, invertebrates have both excitatory and inhibitory responses to odors. 123,124 The response of insects to odor is a rapid process, while our fig wasp samples were exposed to odor for at least half an hour. Therefore, these down-regulated genes should be an inhibitory response to persistent odor stimulation. In order to have a detailed understanding of the induction and transduction of odor stimulation in fig wasps, we should give different periods of odor stimulation to fig wasps, and then compare the expression of these genes. In addition, transcriptome analysis focusing on the specialized olfactory organs (such as antennae) can find more DEGs.

Conclusions
We have provided a high-quality chromosome-level genome assembly of V. javana using a combined Illumina þ PacBio assembly strategy. Basic genomic analyses revealed genome features, phylogenetic position, gene gains and loss and gene evolution of V. javana. We found chemosensory genes in four fig wasp species and compared the number, genetic relationship and try to predict some functions of them according to the predicted function of these genes in A. mellifera and D. melanogaster. We also found several DEGs in transcriptome data of V. javana, which play important roles in some olfaction related pathways. This genome assembly will serve as a useful resource for further research into insect biology, pollinator-host interactions, comparative genomics and coevolution.

Supplementary data
Supplementary data are available at DNARES online.

Funding
This work was supported by the National Natural Science Foundation of China (Grant numbers: 31971568, 32150410364 and 31630008), Province Natural Science Foundation of Guangdong (Grant number: c20140500001306) and The Chinese Academy of Sciences PIFI Fellowship for Visiting Scientists (2022VBA0002).

Conflict of interest
None declared.

Data availability
All genome sequence data of V. javana are available in GWH database of the National Genomics Data Center (NGDC) with accession no. GWHBDGE00000000. The raw genome sequencing data of Illumina and PacBio have deposited in the GSA database of the National Genomics Data Center (NGDC) with accession number of CRA004469. The V. javana transcriptome data are available in the GSA database of the National Genomics Data Center (NGDC) with accession number CRA004554.